CS 432-Fall 2001
Assignment 6
Transaction Simulation & Crash Recovery
Due Date: December 4, 11:59pm.
Note: You will not have to write a lot of code for this assignment, but
there are many small details to take care of when writing an ARIES-style
recovery manager. I strongly suggest that you start soon. In addition, it is
very worthwhile to write pseudo-code before you start coding, and to make sure
that you understand all the different components of the assignment. You will do
this assignment in groups of two students.
Introduction
In this assignment you will complete certain parts of a database engine. The
database is modeled very simply, and runs by executing a series of transaction
operations given to it by the user. Each transaction performs a series of reads
and writes on the pages of the database, and can commit or abort at any time.
The simulator includes a recovery manager module that keeps a log of the
database's activity. At any point in time, the database may crash
(specifically, when it encounters a special command to induce a crash). The
recovery manager then performs a restart to restore the database to a correct
state, using the ARIES recovery algorithm.
The Transaction Simulator
The transaction simulator works just like a mini-database. There is a series
of pages stored in a file on disk, accessed through a buffer. Transactions are
modeled as a sequence of reads and writes to the pages of the database. A log
is kept of all changes to database pages, with WAL used to ensure that
committed transactions are fully represented by the log as written to stable
storage. Checkpoints can be inserted at will.
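The WAL discipline described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the simulator's code: ToyLog, ToyPage, writePageToDisk and commit are hypothetical names, and "stable storage" is modeled by a single counter.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of write-ahead logging (WAL). All names here are hypothetical
// and are not part of the MARS simulator.
struct ToyLog {
    std::vector<std::string> records;  // records[i] has LSN i + 1
    int flushedLSN = 0;                // highest LSN on stable storage

    // Append a record to the in-memory log tail and return its LSN.
    int append(const std::string& rec) {
        records.push_back(rec);
        return static_cast<int>(records.size());
    }

    // Force the log to stable storage up to and including lsn.
    void flushUpTo(int lsn) {
        assert(lsn <= static_cast<int>(records.size()));
        if (lsn > flushedLSN) flushedLSN = lsn;
    }
};

struct ToyPage {
    int pageLSN = 0;    // LSN of the last log record that changed this page
    bool dirty = false;
};

// WAL rule 1: a dirty page may reach disk only after every log record
// describing its changes is stable.
void writePageToDisk(ToyLog& log, ToyPage& page) {
    log.flushUpTo(page.pageLSN);
    page.dirty = false;  // the actual disk write would happen here
}

// WAL rule 2: a transaction counts as committed only once its commit
// record (and therefore everything logged before it) is stable.
int commit(ToyLog& log, int xid) {
    int lsn = log.append("COMMIT T" + std::to_string(xid));
    log.flushUpTo(lsn);
    return lsn;
}
```

In this sketch the log is always flushed before the page or the commit becomes visible on disk, which is the ordering that lets recovery rely on the log.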
This assignment uses an application called MARS which is a GUI for testing
the simulator. The input comes from .mars files that can be found in the folder
Tests. These files can be edited using any simple text editor. The syntax for
commands to the simulator is discussed in Getting
Started with the Simulator.
A breakdown of the major components follows:
- BufMgr - This module operates
much like the buffer you implemented for project 1. When a page is to be
read or written to, a pinPage() request is made, and the page is, if not
already present, loaded into the buffer pool. On request, one or all pages
can be flushed. Pages are released with an unpinPage() call.
The buffer for this project is somewhat small, being intended to
illustrate the role of the buffer in crash recovery.
- Xaction_table - This keeps
track of all the transactions currently active in the database. It also
keeps information on each active transaction, namely the oldest and most
recent log sequence numbers (LSNs) for each transaction.
- The log subsystem:
- log - This is the log
that is used in WAL while the transactions are executing. It provides
simple sequential access to the log files for reading purposes, and can
append new records to the end of the log. The log records are of
uniform size, and the log does not concern itself with the type of
log record being written.
- logrecord - This
represents a single log entry. A special field is kept in each record to
identify the variety of log record (commit, update, etc.) that it is. It
contains a generic data buffer, which holds the information specific to
each type of log record.
- masterlog - This keeps
track of the checkpointing, so that the recovery process will not need to
go back too far to reconstruct the database at the time of the crash.
- LogData structures - These
classes represent each type of update, and the information associated with
it like prevLSN, xaction_id, and the page affected. Each Log Data
structure is stored in the data field of a log record, and should be
accessed by typecasting that data to the appropriate LogData type (UPD,
CLR, ABORT).
- Recovery Manager - consists
of several related modules for logging & recovery procedures. While
the database is running, each transaction has its own recovery manager
that is responsible for logging its actions. Only one recovery module,
though, is necessary for performing crash recovery. The pieces of the
manager are:
- The logging
functionality (logfunc.cpp) - This translates write requests by
transactions into Update records that are written to the log, and creates
CLR records whenever a transaction aborts on its own.
- rollback.cpp - This
performs a rollback on a transaction that has just aborted on its own,
undoing the changes and making sure that none of them persist even after
a crash.
- restart.cpp - This is
the part responsible for bringing the database up to a consistent state
following a crash. It performs the three phases of the Aries recovery
algorithm based on the information written to the log.
- Checkpoint - This
generates checkpoints and writes them to the log, extracting the
information from the Xaction table and getting the Dirty Page Table from
the Buffer.
- Recovery
DirtyPageTable - this is the list of possibly dirty pages that is built
during the analysis phase of recovery.
- Recovery XactTable -
this is the list of active transactions that is built during the analysis
phase of recovery.
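The relationship between a log record and its LogData payload, described in the component list above, can be sketched as follows. The types ToyLogRecord and ToyUpdateData and their fields are hypothetical stand-ins, not the simulator's classes. The simulator's code typecasts the data buffer directly; this sketch copies with memcpy instead, which gives the same result for plain structs and sidesteps alignment concerns.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the simulator's logrecord and LogData types.
enum ToyRecType { TOY_UPDATE, TOY_CLR, TOY_COMMIT, TOY_ABORT };

const int TOY_DATA_SIZE = 64;  // assumed payload size, chosen for the sketch

struct ToyUpdateData {  // payload of an update record
    int prevLSN;        // previous record of the same transaction
    int xactionId;
    int pageId;
    int oldValue;       // before-image, used by undo
    int newValue;       // after-image, used by redo
};

struct ToyLogRecord {
    ToyRecType type;           // tells the reader how to interpret data
    char data[TOY_DATA_SIZE];  // generic, fixed-size payload buffer
};

static_assert(sizeof(ToyUpdateData) <= TOY_DATA_SIZE, "payload must fit");

// Pack an update payload into a uniform-size log record.
ToyLogRecord makeUpdateRecord(const ToyUpdateData& upd) {
    ToyLogRecord rec;
    rec.type = TOY_UPDATE;
    std::memset(rec.data, 0, sizeof rec.data);
    std::memcpy(rec.data, &upd, sizeof upd);
    return rec;
}

// Recover the typed payload; the type tag is checked before the bytes
// are interpreted.
ToyUpdateData readUpdateData(const ToyLogRecord& rec) {
    assert(rec.type == TOY_UPDATE);
    ToyUpdateData upd;
    std::memcpy(&upd, rec.data, sizeof upd);
    return upd;
}
```

Because every record has the same size and carries its own type tag, the log itself never needs to know which kind of payload it is storing.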
The recmgr_tab.c module is responsible for parsing the input. It is computer
generated, and not very fit for modification. Don't worry too much about what
it does; just know that it takes one command at a time from the standard input
(which we might have redirected to point to a file) and converts it into a
database operation.
The handle.cpp module contains the standalone functions for performing the
commands. There is one function for each operation, and these are the "top-level"
functions that are called first when an operation is performed. The functions in
handle.cpp call the Recovery Manager functions that you will be finishing, and
pass in the relevant data about the operations.
The complete code is available in \\goose\cs432-fall01\a6
as a complete project. You can open it by double-clicking
the Mars.dsw file. All you need to do is fill in the gaps inside two
source files (logfunc.cpp and restart.cpp) in the project.
When you have made your modifications and are ready to test your code,
you can run the Mars.exe file generated by Visual Studio when you compile the
project. When generating Mars.exe, it is recommended that you
build the release version of the project. This can be done by opening the Build
menu, selecting "Set Active Configuration", and choosing
"Release". The .exe file can then be found in the "Release"
folder. Simply double-click it to start the transaction manager.
Your Task
You will implement various pieces of the recovery module. Specifically, you
will implement the functionality to handle the most basic and common
transaction operation, the write (WriteUpdateLog()). You must add code to the logging
mechanism so that writes (also called updates) to the pages of the DB
are reflected in the log, and then implement those parts of the restart
mechanism that deal specifically with those log entries to restore the database
to a consistent state.
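As a rough illustration of what "reflecting a write in the log" involves in an ARIES-style system, here is a toy model. It is a sketch under stated assumptions and does not use the simulator's classes or the signature of WriteUpdateLog(): ToyDb, ToyUpdate and logAndApplyUpdate are hypothetical names, pages hold a single integer, and LSN 0 means "no record".

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model only: none of these names come from the simulator.
struct ToyUpdate {
    int lsn, prevLSN, xid, pageId, oldValue, newValue;
};

struct ToyDb {
    std::vector<ToyUpdate> log;  // log[i] has LSN i + 1
    std::map<int, int> lastLSN;  // transaction table: xid -> last LSN
    std::map<int, int> pageLSN;  // pageId -> LSN of last change
    std::map<int, int> recLSN;   // dirty page table: pageId -> first dirtying LSN
    std::map<int, int> value;    // pageId -> current contents
};

// Log a write by transaction xid, then apply it to the page.
int logAndApplyUpdate(ToyDb& db, int xid, int pageId, int newValue) {
    assert(xid >= 0 && pageId >= 0);

    ToyUpdate u;
    u.lsn = static_cast<int>(db.log.size()) + 1;
    u.prevLSN = db.lastLSN.count(xid) ? db.lastLSN[xid] : 0;  // 0 = none
    u.xid = xid;
    u.pageId = pageId;
    u.oldValue = db.value[pageId];  // before-image, needed for undo
    u.newValue = newValue;          // after-image, needed for redo
    db.log.push_back(u);            // the log record is written first

    db.lastLSN[xid] = u.lsn;        // chain this transaction's records
    db.value[pageId] = newValue;    // only then is the page changed
    db.pageLSN[pageId] = u.lsn;
    if (!db.recLSN.count(pageId)) db.recLSN[pageId] = u.lsn;  // first dirtier
    return u.lsn;
}
```

The bookkeeping touches three places besides the log: the transaction's last LSN, the page's pageLSN, and the dirty page entry. Restart later depends on all three being accurate.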
The code you will write belongs in these modules:
- logfunc.cpp - You should
complete this function:
- WriteUpdateLog() -
Given the information about a given update, generate an update log
record, and update all affected information in the rest of the database.
- restart.cpp - You should
complete these four main functions:
- restart_analysis -
This scans the log forward from the last checkpoint, building up
information about the database at the time of the crash. Fill in the code
executed when the record being looked at is an update record, updating
the Recovery Xaction table and Recovery Dirty Page Table being rebuilt.
- redo_update - This is
the function for redoing a single action. Add the code for redoing an
update (UPD) record, extracting the necessary information from the log
record data and performing the update.
- restart_redo - This is
the second phase of recovery, and restores the database to its pre-crash
state. You should implement the code that handles each update record,
calling redo_update if the action needs to be retaken. You should
consider the pageLSN stored with each page to determine the necessity of
repeating the action, and update the Recovery Dirty Page Table where
necessary.
- restart_undo - This
third phase of recovery aborts all transactions active at the time of the
crash, scanning the log backwards and undoing the actions of those
transactions. Implement the code that handles the case for undoing an
update record. You'll need to work with the Recovery Xaction table, and
you should generate a CLR to be written to the log.
Hint: This is essentially a rollback of the transaction, so look at
the code that normally handles a rollback when a transaction aborts.
- You should also finish
three small methods concerning the control of the recovery process:
- findRedoLsn() -
identifying the earliest LSN from which the redo process should start.
- keepPerformingUndo() -
determining when to stop the undo process.
- findNextUndoLsn() -
identifying the next log record to be undone in the undo process.
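The control decisions listed above follow the standard ARIES rules, which can be sketched on toy tables. These helpers are hypothetical: their names, signatures and table representations are assumptions made for illustration and do not match the simulator's Recovery DirtyPageTable or Recovery XactTable classes. LSN 0 is used to mean "nothing left".

```cpp
#include <algorithm>
#include <cassert>
#include <map>

// Toy tables; the real recovery tables in the project have their own APIs.
typedef std::map<int, int> ToyDirtyPageTable;  // pageId -> recLSN
typedef std::map<int, int> ToyXactTable;       // xid -> next LSN to undo (0 = done)

// Redo starts at the smallest recLSN in the dirty page table.
// Returns 0 when the table is empty (nothing to redo).
int toyFindRedoLsn(const ToyDirtyPageTable& dpt) {
    int redoLsn = 0;
    for (const auto& entry : dpt)
        if (redoLsn == 0 || entry.second < redoLsn) redoLsn = entry.second;
    return redoLsn;
}

// An update with the given lsn is redone only if all three tests pass.
// pageLSN is the value read from the page itself.
bool toyNeedsRedo(const ToyDirtyPageTable& dpt, int pageId, int lsn, int pageLSN) {
    assert(lsn > 0);
    ToyDirtyPageTable::const_iterator it = dpt.find(pageId);
    if (it == dpt.end()) return false;   // page was not dirty at the crash
    if (it->second > lsn) return false;  // change reached disk before recLSN
    return pageLSN < lsn;                // page does not yet reflect this update
}

// Undo continues while some loser transaction still has records to undo.
bool toyKeepPerformingUndo(const ToyXactTable& losers) {
    for (const auto& entry : losers)
        if (entry.second != 0) return true;
    return false;
}

// Undo always handles the largest outstanding LSN next, so the log is
// read strictly backwards across all loser transactions.
int toyFindNextUndoLsn(const ToyXactTable& losers) {
    int next = 0;
    for (const auto& entry : losers) next = std::max(next, entry.second);
    return next;
}
```

After a record is undone, the owning transaction's entry would be replaced by that record's prevLSN (or, for a CLR, its undo-next LSN), which is what eventually drives every entry to 0 and ends the undo phase.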
Reference
The following pages provide more detailed explanations about the classes and
types that will be useful for this assignment.
Minor Bugs
- When you run Mars with a new
test, it may not produce values for the read commands (e.g., read 1 4;
will display 'read returned 0'). This is usually characterized by ALL the
read commands returning 0. Close Mars and run it again by choosing the
test you want to load from the Transactions menu's recent test files a la
Word, Excel, etc, and it should work. This bug should not hamper your work
in any way but we're working to fix it.
Tips
- If you want to print out
something other than a log record (which would use PrintLogRec ( )), use
the function WriteLogOutput ( char * ). In some files, it may have to be
extern'ed before it can be used.
As an example:
extern void WriteLogOutput(char *);
...
WriteLogOutput( "now entering function WriteUpdateLog( )" );
...
char s[80];  // must hold the whole message plus the formatted LSN
sprintf( s, "the LSN of the record just written was %d",
lsn.GetOffset( ) );
WriteLogOutput( s );
- But for more efficient
debugging, use the VC++ debugger. Make sure you are in the Debug version
and not the Release version. Go under the menu Build -> Set Active
Configuration... and select Mars -> Win32 Debug.
Submission procedure
How and what to hand in:
- You are a group of two
students. Drop your project file, source code, executable, and
documentation into one of your handin directories in \\goose\courses\cs432-fa01\HandinA6.
So if students with netids "abc1" and "xyz2" form a
group together, only one of the two handin directories (for example
the directory \\goose\courses\cs432-fa01\HandinA6\abc1)
will contain the final project of the group.
- Make sure that the executable in this folder is ready to run, and
that your whole project can be recompiled as needed. The TAs should
be able to double-click on your workspace and then do a Build All
without problems.
- The output, with debugging
enabled, of your code when run on the tests that demonstrate the full range
of functionality. The output from Tests 7, 8, and 9 would be suitable. Be
sure to include enough detailed output so that I can see the steps taken by
the recovery manager to restore the database after a crash. The debugging
code already written (mainly PrintLogRec) and included in the code should
be sufficient for this purpose.
- An explanation of your code,
including any assumptions made, and any deviations from the standard Aries
recovery scheme given in the text. Include some comments on the design of
the recovery manager, including problems you saw and ways to improve the
code.
Keep a copy of the project in your own account just in case.
Grading
- 70% Correctness
- 20% Documentation
- 10% Coding Style